227 research outputs found
Within-Speaker Features for Native Language Recognition in the Interspeech 2016 Computational Paralinguistics Challenge
The Interspeech 2016 Native Language recognition challenge was to identify the first language of 867 speakers from their spoken English. Effectively this was an L2 accent recognition task where the L1 was one of eleven languages. The lack of transcripts of the spontaneous speech recordings meant that the currently best performing accent recognition approach (ACCDIST) developed by the author could not be applied. Instead, the objectives of this study were to explore whether within-speaker features found to be effective in ACCDIST would also have value within a contemporary GMM-based accent recognition approach. We show that while Gaussian mean supervectors provide the best performance on this task, small gains may be had by fusing the mean supervector system with a system based on within-speaker Gaussian mixture distances
Using Web Audio To Deliver Interactive Speech Tools In The Browser
In 2014, the number of web pages delivered to tablets and smartphones overtook the number delivered to laptop and desktop computers, with a majority of users saying they prefer these new portable platforms over conventional computers for many tasks. This shift in device use provides both opportunities and challenges for providers of speech analysis tools, phonetic demonstrations and language teaching aids. It is an opportunity because web standards mean we can make our applications available to a wide audience through a single consistent programming architecture rather than writing for one particular computing platform. It is a challenge because tablets and smartphones are less powerful, require different programming skills and have different limitations in terms of user interface. In this article, I will show how interactive applications in Phonetics and Speech Science can be written to run in web browsers on any computing platform. These are native web applications, written in HTML, CSS and JavaScript that can capture, replay, display, process, and analyze audio using the Web Audio API without needing any plugins. I will describe - and give the URLs of - some demonstration applications. I will discuss some future opportunities in the area of collaborative research and some remaining challenges that arise from incompatibilities across browsers. My audience is teachers and students with intermediate web programming skills wanting to build custom speech displays, perform custom speech analysis or run speech audio experiments over the web
A Comparison of Human and Machine Estimation of Speaker Age
The estimation of the age of a speaker from his or her voice has both forensic and commercial applications. Previous studies have shown that human listeners are able to estimate the age of a speaker to within 10 years on average, while recent machine age estimation systems seem to show superior performance with average errors as low as 6 years. However, the machine studies have used highly non-uniform test sets, for which knowledge of the age distribution offers considerable advantage to the system. In this study we compare human and machine performance on the same test data chosen to be uniformly distributed in age. We show that in this case human and machine accuracy is more similar, with average errors of 9.8 and 8.6 years respectively, although if panels of listeners are consulted, human accuracy can be improved to a value closer to 7.5 years. Both humans and machines have difficulty in accurately predicting the ages of older speakers
Two-level recognition of isolated word using neural nets
Describes a neural-net based isolated word recogniser that has a better performance on a standard multi-speaker database than the reference hidden Markov model recogniser. The complete neural net recogniser is formed from two parts: a front-end which transforms the complex acoustic specification of the speech into a simplified phonetic feature specification, and a whole-word discriminator net. Each level was trained separately, thus considerably reducing the time necessary to train the overall system
Two-level recognition of isolated word using neural nets
This paper describes a neural-net based isolated word recogniser that has a better performance on a standard multi-speaker database than our reference Hidden Markov Model recogniser. The complete neural net recogniser is formed from two parts: a front-end which transforms the complex acoustic specification of the speech into a simplified phonetic feature specification, and a whole-word discriminator net. Each level was trained separately, thus considerably reducing the time necessary to train the overall system
Predicting fatigue and psychophysiological test performance from speech for safety-critical environments
Automatic systems for estimating operator fatigue have application in safety-critical environments. A system which could estimate level of fatigue from speech would have application in domains where operators engage in regular verbal communication as part of their duties. Previous studies on the prediction of fatigue from speech have been limited because of their reliance on subjective ratings and because they lack comparison to other methods for assessing fatigue. In this paper, we present an analysis of voice recordings and psychophysiological test scores collected from seven aerospace personnel during a training task in which they remained awake for 60 h. We show that voice features and test scores are affected by both the total time spent awake and the time position within each subject’s circadian cycle. However, we show that time spent awake and time-of-day information are poor predictors of the test results, while voice features can give good predictions of the psychophysiological test scores and sleep latency. Mean absolute errors of prediction are possible within about 17.5% for sleep latency and 5–12% for test scores. We discuss the implications for the use of voice as a means to monitor the effects of fatigue on cognitive performance in practical applications
It Sounds Like You Have a Cold! Testing Voice Features for the Interspeech 2017 Computational Paralinguistics Cold Challenge
This paper describes an evaluation of four different voice feature sets for detecting symptoms of the common cold in speech as part of the Interspeech 2017 Computational Paralinguistics Challenge. The challenge corpus consists of 630 speakers in three partitions, of which approximately one third had a “severe” cold at the time of recording. Success on the task is measured in terms of unweighted average recall of cold/not-cold classification from short extracts of the recordings. In this paper we review previous voice features used for studying changes in health and devise four basic types of features for evaluation: voice quality features, vowel spectra features, modulation spectra features, and spectral distribution features. The evaluation shows that each feature set provides some useful information to the task, with features from the modulation spectrogram being most effective. Feature-level fusion of the feature sets shows small performance improvements on the development test set. We discuss the results in terms of the most suitable features for detecting symptoms of cold and address issues arising from the design of the challenge
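The challenge metric named above, unweighted average recall (UAR), is the mean of the per-class recalls, so each class counts equally however many samples it has. The sketch below is a minimal illustration only: the function name and the toy labels are assumptions made for this example and are not taken from the paper or the challenge data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls, weighting every class equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled sample is required")

    totals = defaultdict(int)  # number of samples per true class
    hits = defaultdict(int)    # correctly classified samples per true class
    for truth, guess in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == guess:
            hits[truth] += 1

    # Recall is computed per class first, then averaged without weighting.
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, made-up labels: 2 "cold" samples and 4 "not-cold" samples.
y_true = ["cold", "cold", "not-cold", "not-cold", "not-cold", "not-cold"]
y_pred = ["cold", "not-cold", "not-cold", "not-cold", "not-cold", "cold"]

uar = unweighted_average_recall(y_true, y_pred)

# A classifier that always answers "not-cold" gets every majority sample
# right but still scores only chance level on this metric.
majority_uar = unweighted_average_recall(y_true, ["not-cold"] * len(y_true))
```

With these toy labels the per-class recalls are 1/2 and 3/4, so the UAR is 0.625, while the always-"not-cold" classifier scores 0.5. This is why the measure suits a corpus in which only about a third of the speakers have a cold.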
X-ray emission from the Sombrero galaxy: discrete sources
We present a study of discrete X-ray sources in and around the
bulge-dominated, massive Sa galaxy, Sombrero (M104), based on new and archival
Chandra observations with a total exposure of ~200 ks. With a detection limit
of L_X = 1E37 erg/s and a field of view covering a galactocentric radius of ~30
kpc (11.5 arcminutes), 383 sources are detected. Cross-correlation with Spitler
et al.'s catalogue of Sombrero globular clusters (GCs) identified from HST/ACS
observations reveals 41 X-ray sources in GCs, presumably low-mass X-ray
binaries (LMXBs). We quantify the differential luminosity functions (LFs) for
both the detected GC and field LMXBs, whose power-law indices (~1.1 for the
GC-LF and ~1.6 for field-LF) are consistent with previous studies for
elliptical galaxies. With precise sky positions of the GCs without a detected
X-ray source, we further quantify, through a fluctuation analysis, the GC LF at
fainter luminosities down to 1E35 erg/s. The derived index rules out a
faint-end slope flatter than 1.1 at a 2 sigma significance, contrary to recent
findings in several elliptical galaxies and the bulge of M31. On the other
hand, the 2-6 keV unresolved emission places a tight constraint on the field
LF, implying a flattened index of ~1.0 below 1E37 erg/s. We also detect 101
sources in the halo of Sombrero. The presence of these sources cannot be
interpreted as galactic LMXBs whose spatial distribution empirically follows
the starlight. Their number is also higher than the expected number of cosmic
AGNs (52+/-11 [1 sigma]) whose surface density is constrained by deep X-ray
surveys. We suggest that either the cosmic X-ray background is unusually high
in the direction of Sombrero, or a distinct population of X-ray sources is
present in the halo of Sombrero. Comment: 11 figures, 5 tables, ApJ in press
Performance of the CMS Cathode Strip Chambers with Cosmic Rays
The Cathode Strip Chambers (CSCs) constitute the primary muon tracking device
in the CMS endcaps. Their performance has been evaluated using data taken
during a cosmic ray run in fall 2008. Measured noise levels are low, with the
number of noisy channels well below 1%. Coordinate resolution was measured for
all types of chambers, and falls in the range 47 microns to 243 microns. The
efficiencies for local charged track triggers, for hit and for segment
reconstruction were measured, and are above 99%. The timing resolution per
layer is approximately 5 ns
Performance and Operation of the CMS Electromagnetic Calorimeter
The operation and general performance of the CMS electromagnetic calorimeter
using cosmic-ray muons are described. These muons were recorded after the
closure of the CMS detector in late 2008. The calorimeter is made of lead
tungstate crystals and the overall status of the 75848 channels corresponding
to the barrel and endcap detectors is reported. The stability of crucial
operational parameters, such as high voltage, temperature and electronic noise,
is summarised and the performance of the light monitoring system is presented